Communication-Efficient SGD
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to SGD's excellent scalability properties. A fundamental barrier when parallelizing SGD is the high bandwidth cost of communicating gradient updates between nodes; consequently, several lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always guarantee convergence, and it is not clear whether they can be improved. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes for gradient updates which provides convergence guarantees. QSGD allows the user to smoothly trade off \emph{communication bandwidth} and \emph{convergence time}: nodes can adjust the number of bits sent per iteration, at the cost of possibly higher variance. We show that this trade-off is inherent, in the sense that improving it past some threshold would violate information-theoretic lower bounds. QSGD guarantees convergence for convex and non-convex objectives, under asynchrony, and can be extended to stochastic variance-reduced techniques. When applied to training deep neural networks for image classification and automated speech recognition, QSGD leads to significant reductions in end-to-end training time. For example, on 16 GPUs, we can train the ResNet-152 network to full accuracy on ImageNet 1.8x faster than the full-precision variant.
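To make the bandwidth/variance trade-off concrete, the sketch below shows the kind of stochastic, unbiased quantizer QSGD is built around: each coordinate is rounded to one of $s$ uniform levels of its magnitude relative to the gradient's norm, with probabilities chosen so the quantized gradient is unbiased in expectation. The function name and the use of NumPy are illustrative, and the lossless (Elias-style) encoding of the quantized values is omitted.

```python
import numpy as np

def qsgd_quantize(v, s, rng=None):
    """Unbiased stochastic quantization of a gradient vector to s levels.

    Each coordinate is mapped to sign(v_i) * ||v||_2 * (l / s), where l is
    randomly chosen between the two integer levels bracketing
    s * |v_i| / ||v||_2, with probabilities that keep the quantizer unbiased.
    The lossless (Elias-style) encoding of the output is omitted here.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * s            # each entry lies in [0, s]
    lower = np.floor(scaled)                 # lower bracketing level
    prob_up = scaled - lower                 # probability of rounding up
    levels = lower + (rng.random(v.shape) < prob_up)
    return np.sign(v) * norm * levels / s
```

Fewer levels (small $s$) mean fewer bits per coordinate but a higher-variance gradient estimate, which is exactly the trade-off the abstract describes.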
Communication-efficient SGD: From Local SGD to One-Shot Averaging
We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among N workers, who can take SGD steps and coordinate with a central server. While it is possible to obtain a linear reduction in the variance by averaging all the stochastic gradients at every step, this requires a lot of communication between the workers and the server, which can dramatically reduce the gains from parallelism. The Local SGD method, proposed and analyzed in the earlier literature, suggests machines should make many local steps between such communications. While the initial analysis of Local SGD showed it needs $\Omega(\sqrt{T})$ communications for $T$ local gradient steps in order for the error to scale proportionately to $1/(NT)$, this has been successively improved in a string of papers, with the state of the art requiring $\Omega\left(N \, \mathrm{poly}(\log T)\right)$ communications. In this paper, we suggest a Local SGD scheme that communicates less overall by communicating less frequently as the number of iterations grows.
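The sketch below illustrates the basic Local SGD loop described above, assuming a fixed number of local steps between averagings; the scheme proposed in the paper additionally lets the interval between communications grow with the iteration count. All names (local_sgd, grad_fn) are illustrative, not the authors' code.

```python
import numpy as np

def local_sgd(grad_fn, x0, n_workers, n_rounds, local_steps, lr, rng):
    """Minimal Local SGD loop with a fixed communication interval.

    Each worker keeps its own copy of the parameters, takes `local_steps`
    stochastic gradient steps, and then all copies are averaged (the single
    communication per round).  grad_fn(x, rng) returns a stochastic gradient.
    """
    workers = [x0.copy() for _ in range(n_workers)]
    for _ in range(n_rounds):
        for w in range(n_workers):
            for _ in range(local_steps):
                workers[w] = workers[w] - lr * grad_fn(workers[w], rng)
        averaged = np.mean(workers, axis=0)      # communication / averaging step
        workers = [averaged.copy() for _ in range(n_workers)]
    return workers[0]
```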
CSER: Communication-efficient SGD with Error Reset
The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. CSER is built on two key ideas: first, a new technique called ``error reset'' that adapts arbitrary compressors for SGD, producing bifurcated local models with a periodic reset of the resulting local residual errors; second, partial synchronization of both the gradients and the models, which leverages the advantages of each. We prove the convergence of CSER for smooth non-convex problems. Empirical results show that, when combined with highly aggressive compressors, the CSER algorithms accelerate distributed training by nearly $10\times$ for CIFAR-100 and by $4.5\times$ for ImageNet.
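As a rough illustration of how a compressor can be paired with a locally carried residual, here is a hedged sketch: the worker compresses the sum of its step and the residual, transmits the compressed part, and keeps the leftover locally; in CSER this residual is periodically reset to zero, tied to a partial synchronization of the local models. This is only a generic sketch of the error-compensation idea, not CSER's exact update rule, and top_k is just one example of an aggressive compressor.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of a 1-D vector, zero the rest
    (one example of an aggressive compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def compressed_step_with_residual(grad, residual, lr, k):
    """One worker-side step of compression with a carried local residual.

    The worker compresses (lr * grad + residual), transmits the compressed
    part, and keeps the leftover as the new residual.  In CSER the residual
    is periodically reset to zero (the "error reset"), which is coupled to a
    partial synchronization of the local models; that outer loop is not shown.
    """
    corrected = lr * grad + residual
    sent = top_k(corrected, k)        # what gets communicated
    new_residual = corrected - sent   # compression error kept locally
    return sent, new_residual
```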
Reviews: QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Update: I slightly decrease the grade due to the mismatch between the theoretical and practical results, which could be better covered. Still, this paper has strong experimental results and some theoretical results. I would encourage the authors to narrow the gap between the two. In this paper the authors introduce Quantized SGD (QSGD), a scheme for reducing the communication cost of SGD when performing distributed optimization. The quantization scheme is useful as soon as one has to transmit gradients between different machines.
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Alistarh, Dan, Grubic, Demjan, Li, Jerry, Tomioka, Ryota, Vojnovic, Milan
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to SGD's excellent scalability properties. A fundamental barrier when parallelizing SGD is the high bandwidth cost of communicating gradient updates between nodes; consequently, several lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always guarantee convergence, and it is not clear whether they can be improved. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes for gradient updates which provides convergence guarantees. QSGD allows the user to smoothly trade off \emph{communication bandwidth} and \emph{convergence time}: nodes can adjust the number of bits sent per iteration, at the cost of possibly higher variance.
Hyper-Sphere Quantization: Communication-Efficient SGD for Federated Learning
Dai, Xinyan, Yan, Xiao, Zhou, Kaiwen, Yang, Han, Ng, Kelvin K. W., Cheng, James, Fan, Yu
The high cost of communicating gradients is a major bottleneck for federated learning, as the bandwidth of the participating user devices is limited. Existing gradient compression algorithms are mainly designed for data centers with high-speed networks and achieve $O(\sqrt{d} \log d)$ per-iteration communication cost at best, where $d$ is the size of the model. We propose hyper-sphere quantization (HSQ), a general framework that can be configured to achieve a continuum of trade-offs between communication efficiency and gradient accuracy. In particular, at the high-compression end, HSQ provides a low per-iteration communication cost of $O(\log d)$, which is favorable for federated learning. We prove the convergence of HSQ theoretically and show by experiments that HSQ significantly reduces the communication cost of model training without hurting convergence accuracy.
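The sketch below illustrates the vector-quantization style of scheme the abstract describes, in which only a codeword index and a scale are transmitted, giving a per-iteration payload logarithmic in the codebook size rather than linear in $d$. The random codebook, function names, and decoding step are hypothetical simplifications for illustration only, not HSQ's actual construction.

```python
import numpy as np

def make_codebook(d, m, seed=0):
    """A hypothetical shared codebook of m unit vectors in R^d.

    Random vectors are used purely for illustration; the actual HSQ
    codebook construction is specified in the paper.
    """
    rng = np.random.default_rng(seed)
    c = rng.standard_normal((m, d))
    return c / np.linalg.norm(c, axis=1, keepdims=True)

def encode(v, codebook):
    """Transmit only the index of the codeword most aligned with the gradient,
    plus the gradient's norm: O(log m) bits and one scalar instead of d floats."""
    idx = int(np.argmax(codebook @ v))
    return idx, float(np.linalg.norm(v))

def decode(idx, norm, codebook):
    """Reconstruct an approximate gradient from the index and the norm."""
    return norm * codebook[idx]
```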